Request metrics improvements #8420
Conversation
- Update groupLabel to use a type definition
- Include effective latency and average speed in the metrics
- Include parallel and sequential request counts
Added my comments, wherever I have an idea :)
}

const urlData = Object.entries(groupData.urls);
const sumLatency = urlData.reduce((result, url) => (result + (url[1].metrics?.latency ?? 0)), 0);
Should we discard 0 latency from the equation? Ideally it should never happen though, right?
It should never happen, really; the 0 here is only forced to keep the types safe.
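For context, a minimal sketch of the type issue (hypothetical shape, mirroring the snippet above):

// 'metrics' is optional, so 'entry.metrics?.latency' has type
// 'number | undefined'; the '?? 0' fallback narrows it to 'number',
// keeping the reduce accumulator a plain number.
type UrlEntry = {metrics?: {latency: number}};

const sumLatencies = (entries: UrlEntry[]): number =>
    entries.reduce((result, entry) => result + (entry.metrics?.latency ?? 0), 0);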
app/client/rest/tracking.ts
Outdated
const effectiveParallelLatency = Math.max(...parallelLatencies, 0);
const effectiveSequentialLatency = sequentialLatencies.reduce((sum, latency) => sum + latency, 0);

const effectiveLatency = effectiveParallelLatency + effectiveSequentialLatency;
Do these need to be summed, or should we take the average, especially for the sequential latency?
As far as I'm aware, for parallel requests the effective latency is the maximum latency across all requests, while sequential requests add up: if you execute two sequential requests and one takes 3s and the other 5s, then you waited a total of 8s.
That is also the reason why I added the overall average latency.
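A minimal standalone sketch of that model (a hypothetical helper for illustration, separate from the actual implementation further down in tracking.ts):

// Parallel requests overlap, so only the slowest one is "felt";
// sequential requests are waited on one after another, so they add up.
const combineLatencies = (parallelMs: number[], sequentialMs: number[]): number => {
    const parallelPart = parallelMs.length ? Math.max(...parallelMs) : 0;
    const sequentialPart = sequentialMs.reduce((sum, ms) => sum + ms, 0);
    return parallelPart + sequentialPart;
};

// Two sequential requests of 3s and 5s: a total wait of 8s.
console.log(combineLatencies([], [3000, 5000])); // 8000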
const averageSpeedMbps = averageSpeedBps / 1_000_000; // Convert to Mbps

return {
    averageSpeedMbps,
I guess we can use the average speed to decide whether to download files automatically, or for similar decisions.
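As a rough illustration of that idea (hypothetical helper and threshold, not part of this PR):

// Hypothetical: gate automatic file downloads on the measured average speed.
const MIN_AUTO_DOWNLOAD_MBPS = 2; // assumed threshold, not from this PR

const shouldAutoDownloadFiles = (averageSpeedMbps: number): boolean =>
    averageSpeedMbps >= MIN_AUTO_DOWNLOAD_MBPS;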
    performance.mark('tti', options);
}

public measureTimeToInteraction() {
Yay, the long-anticipated metric is here :)
Yes, but it is nothing more than output to the console at the moment. I need to figure out where and how to include it.
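For reference, a minimal sketch of how the mark could be turned into a measurement with the standard User Timing API (the 'appStart' mark and the measure name are assumptions; only the 'tti' mark appears in this PR):

performance.mark('appStart'); // hypothetical earlier mark
// ... app becomes interactive ...
performance.mark('tti');
performance.measure('timeToInteraction', 'appStart', 'tti');
const [measure] = performance.getEntriesByName('timeToInteraction');
console.log(`TTI: ${measure?.duration}ms`); // currently just console output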
Should I create another metric on the server to collect this?
@rahimrahman can it be optional? I think that'll be very nice to have, but I haven't checked how to include it, as it is a totally different metric, collected in a different way and somewhere else in the code.
app/client/rest/tracking.ts
Outdated
categorizeRequests(groupLabel: RequestGroupLabel): CategorizedRequests {
    const parallel: ClientResponseMetrics[] = [];
    const sequential: ClientResponseMetrics[] = [];
    const groupData = this.requestGroups.get(groupLabel);
    if (!groupData) {
        return {parallel, sequential};
    }

    // Sort requests by start time
    const requestsMetrics = Object.entries(groupData.urls).map((e) => e[1].metrics).filter((m) => m != null);
    requestsMetrics.sort((a, b) => a.startTime - b.startTime);

    let lastEndTime = 0;

    for (const metrics of requestsMetrics) {
        if (metrics.startTime < lastEndTime) {
            // Overlapping request -> Parallel
            parallel.push(metrics);
        } else {
            // Non-overlapping request -> Sequential
            sequential.push(metrics);
        }

        // Update the last end time
        lastEndTime = Math.max(lastEndTime, metrics.endTime);
    }

    return {parallel, sequential};
}

calculateAverageSpeedWithCategories = (
    categorizedRequests: CategorizedRequests,
    elapsedTimeInSeconds: number, // Observed total elapsed time in seconds
): { averageSpeedMbps: number; effectiveLatency: number } => {
    const {parallel, sequential} = categorizedRequests;

    // Step 1: Calculate total data size in bits
    const totalDataBits = [...parallel, ...sequential].reduce((sum, req) => sum + (req.compressedSize * 8), 0);

    // Step 2: Calculate effective latency
    const parallelLatencies = parallel.map((req) => req.latency);
    const sequentialLatencies = sequential.map((req) => req.latency);

    const effectiveParallelLatency = Math.max(...parallelLatencies, 0);
    const effectiveSequentialLatency = sequentialLatencies.reduce((sum, latency) => sum + latency, 0);

    const effectiveLatency = effectiveParallelLatency + effectiveSequentialLatency;

    // Step 3: Calculate data transfer time
    const dataTransferTime = elapsedTimeInSeconds - (effectiveLatency / 1000);

    // Handle edge case: if data transfer time is zero or negative
    if (dataTransferTime <= 0) {
        return {averageSpeedMbps: 0, effectiveLatency: 0};
    }

    // Step 4: Calculate average speed
    const averageSpeedBps = totalDataBits / dataTransferTime; // Speed in bps
    const averageSpeedMbps = averageSpeedBps / 1_000_000; // Convert to Mbps

    return {
        averageSpeedMbps,
        effectiveLatency,
    };
};
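For illustration, a hypothetical call site for the two functions above (the groupLabel value and the elapsed-time bookkeeping are assumptions, not part of this diff):

// Categorize a group's requests, then derive speed and latency from
// the observed wall-clock time for the whole group.
const categorized = this.categorizeRequests(groupLabel);
const elapsedSeconds = (Date.now() - groupStartTimeMs) / 1000; // assumed bookkeeping
const {averageSpeedMbps, effectiveLatency} = this.calculateAverageSpeedWithCategories(categorized, elapsedSeconds);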
I've made a significant proposed improvement based on observations of the data. I will try to explain as much as possible in this PR: #8439.
- fix: improve parallel requests calculation
- fix test issue
- multiple parallelGroups testing
- calculateAverageSpeedWithCategories test
- categorizedRequests in its own describe
- clean up duplicate tests
- check for when groupLabel is non-existent
- a case when groupLabel is non-existent
- more testing with effective latency
I've spent quite a bit of time updating tracking.ts in order to send telemetry data to the server. At this point, I think it is safe to say that it's been "tested", based on the changes and work that I've done in my 2 PRs.
Looks great. A few comments here and there, but nothing big.
app/actions/local/channel.ts
Outdated
@@ -237,32 +239,51 @@ export async function resetMessageCount(serverUrl: string, channelId: string) {
    }
}

export async function storeMyChannelsForTeam(serverUrl: string, teamId: string, channels: Channel[], memberships: ChannelMembership[], prepareRecordsOnly = false, isCRTEnabled?: boolean) {
const resolveAndflattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
Suggested change:
- const resolveAndflattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
+ const resolveAndFlattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
app/actions/local/channel.ts
Outdated
if (prepareRecordsOnly) {
    return {models: flattenedModels};
}
return resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
I always struggle a bit with how JavaScript handles this. At this point, we are returning the promise and therefore exiting the try/catch block. So if an exception occurs inside the promise, the exception will not be handled here.

In this particular case, the catch will only cover the first promise resolution of prepareAllMyChannels, but not the resolution of resolveAndFlattenModels or the resolution of prepareChannels inside prepareAllMyChannels.

I think the linter understands you are inside a try/catch and allows you to do this:

Suggested change:
- return resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
+ return await resolveAndflattenModels(operator, modelPromises, 'storeAllMyChannels', prepareRecordsOnly);
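A minimal standalone example of the difference (hypothetical names, runnable on its own):

// 'return failing()' exits the try/catch before the promise settles,
// so the local catch never runs; 'return await failing()' keeps the
// rejection inside this function's catch.
const failing = async (): Promise<string> => {
    throw new Error('boom');
};

const withoutAwait = async (): Promise<string> => {
    try {
        return failing(); // rejection escapes to the caller
    } catch (error) {
        return 'caught locally'; // never reached
    }
};

const withAwait = async (): Promise<string> => {
    try {
        return await failing(); // rejection is caught below
    } catch (error) {
        return 'caught locally';
    }
};

withoutAwait().catch(() => console.log('caught by caller'));
withAwait().then((result) => console.log(result)); // logs 'caught locally'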
Sure, but would it get caught by the caller? In this case, /remote/channel.ts has a try-catch block too.
I've confirmed that the await is not needed, since the try..catch on the caller catches this.
But should the caller be catching it? Shouldn't this function be the one catching it?
The point is, there's a try..catch on the callers, so errors within the promise will bubble up and get caught by the callers.
Looking at the function, there is nothing that shows it is going to throw. On the contrary, having the whole body of the function inside a try/catch block should be an indication that you don't expect this function to throw.

If the callers of this function have a try/catch, great, it won't fail. But that doesn't mean this is not wrong. Not only that, we will lose part of the context of what is failing, since the catch will be higher up, instead of here, where it would clearly log that the problem is specifically in storeAllMyChannels.
I made a new commit to add a try..catch in resolveAndFlattenModels. So if any of the models fail for whatever reason, we'll get a more isolated error output.
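Roughly along these lines (a sketch of the idea only; the batchRecords and logError calls are assumptions based on the surrounding code, not the exact commit):

const resolveAndFlattenModels = async (operator: ServerDataOperator, modelPromises: Array<Promise<Model[]>>, description: string, prepareRecordsOnly: boolean) => {
    try {
        const models = (await Promise.all(modelPromises)).flat();
        if (!prepareRecordsOnly && models.length) {
            await operator.batchRecords(models, description); // assumed persistence call
        }
        return {models};
    } catch (error) {
        // The catch sits right here, so the log names the failing operation.
        logError(`resolveAndFlattenModels failed for ${description}`, error); // assumed logger
        return {error};
    }
};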
app/actions/local/channel.ts
Outdated
- return {models: flattenedModels};
+ return resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
Same comment about returning promises:

Suggested change:
- return resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
+ return await resolveAndflattenModels(operator, modelPromises, 'storeMyChannelsForTeam', prepareRecordsOnly);
for (const teamId of teamIdsSet) {
    categoriesPromises.push(client.getCategories('me', teamId, groupLabel));
}
If we have an absurd number of teams, we are flooding the network layer with getCategories calls. It is not common, but we have already had reports of people having 100+ teams.
Unless we create a priority queue down at the network layer, we may want to consider batching this, so other processes can access the network layer before this finishes.
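For example, a simple chunked version of the loop (a hypothetical helper; the batch size is an assumption):

// Issue at most BATCH_SIZE getCategories calls at a time so other
// requests can reach the network layer between batches.
const BATCH_SIZE = 10; // assumed value

const fetchCategoriesInBatches = async (teamIds: string[], groupLabel: RequestGroupLabel) => {
    const results = [];
    for (let i = 0; i < teamIds.length; i += BATCH_SIZE) {
        const batch = teamIds.slice(i, i + BATCH_SIZE);
        // Each batch settles before the next one starts.
        results.push(...await Promise.all(batch.map((teamId) => client.getCategories('me', teamId, groupLabel))));
    }
    return results;
};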
Agreed, thanks for the observation. This PR is trying to establish where the bottlenecks are, learn from that, and then prioritize the issues. We have a long way to go, but first, data.
for (let i = 1; i <= remaining; i++) {
    remainingMembershipsPromises.push(client.getAllMyChannelMembersFromAllTeams(i, General.CHANNEL_MEMBERS_CHUNK_SIZE, groupLabel));
}
This for loop adds to the problem described in my previous comment.
const tabletDevice = isTablet();
const isActiveServer = (await getActiveServerUrl()) === serverUrl;
if (isActiveServer && tabletDevice && initialChannelId === currentChannelId) {
    await markChannelAsRead(serverUrl, initialChannelId, false, groupLabel);
    markChannelAsViewed(serverUrl, initialChannelId);
}
Probably fine, but I wonder why we are removing this.
@yasserfaraazkhan I know you have so much on your plate already. This is another PR that I would love to get through for 2.25.0. In config.json, please enable telemetry. You should be able to see debug output once you've run the app with that setting enabled, or use Charles Proxy to watch mobile telemetry sending data to the server, whether it accepts it or not. If you can spin up a server from my changes here: mattermost/mattermost#29601, you should be able to kill two birds with one stone: the request will show up in the log, and if you connect a Prometheus server to it, you should be able to see the output.
/update-branch
Verified using a /cloud instance with diagnostics enabled.
Checked channel switching and the opening-threads flow. Working as expected.
Force-pushed from 5f37436 to 57a432c.
Cherry pick is scheduled.
Error trying to do the automated cherry picking. Please do this manually.
- Network metrics
- update network-client ref
- fix unit tests
- missing catch error parsing
- Replace APIs to fetch all teams and channels for a user; update groupLabel to use a type definition; include effective latency and average speed in the metrics; include parallel and sequential request counts
- update network library and fix tests
- feat(MM-61865): send network request telemetry data to the server (#8436)
- feat(MM-61865): send network info telemetry data
- unit testing
- fix latency vs effectiveLatency
- cleanup test
- fix failing tests
- fix spelling error
- fix failing test
- more cleanup
- fix: improve parallel requests calculation (#8439)
- fix test issue
- multiple parallelGroups testing
- calculateAverageSpeedWithCategories test
- categorizedRequests in its own describe
- clean up duplicate tests
- check for when groupLabel is non-existent
- a case when groupLabel is non-existent
- more testing with effective latency
- resolveAndFlattenModels with a capital F
- add try..catch within resolveAndFlattenModels
- update groupLabel to fix failing lint
- fix linting issue due to unknown group label
- forgot to push changes
- fix the indentation problem again
- add env var option for COLLECT_NETWORK_METRICS

Co-authored-by: Rahim Rahman <[email protected]>
Co-authored-by: Mattermost Build <[email protected]>
Summary
More improvements to network metrics, including speed and effective latency, and, more importantly, some changes to the entry fetching endpoints.
Ticket Link
https://mattermost.atlassian.net/browse/MM-61878
Release Note